
    Expanding the Role of Synthetic Data at the U.S. Census Bureau

    National Statistical Offices (NSOs) create official statistics from data collected directly from survey respondents, from government administrative records, and from other third-party sources. The raw source data, regardless of origin, are usually considered confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with the needs of data users to extract as much information as possible from rich microdata. Traditional disclosure protection techniques applied to resolve this tension have resulted in official data products that come nowhere close to fully utilizing the information content of the underlying microdata. Typically, these products take the form of basic, aggregate tabulations. In a few cases anonymized public-use micro samples are made available, but these are increasingly at risk of re-identification given the ever larger amount of information about individuals and firms available in the public domain. One potential approach for overcoming these risks is to release products based on synthetic or partially synthetic data, where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata rather than releasing the actual underlying microdata. We discuss recent Census Bureau work to develop and deploy such products. We also discuss the benefits and challenges involved with extending the scope of synthetic data products in official statistics.
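    The sketch below illustrates the partially synthetic idea in miniature: a sensitive variable is replaced by draws from a model fit to the confidential microdata, so the released values mimic the joint distribution without exposing the originals. The data, variable names, and the simple linear model are illustrative assumptions, not the Census Bureau's actual methodology.

```python
# A minimal sketch of partially synthetic data release (assumed data and model,
# not the Census Bureau's production methodology): keep non-sensitive columns,
# replace the sensitive column with draws from a model fit to the microdata.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Pretend confidential microdata: age and sector are quasi-identifiers,
# income is the sensitive variable to be synthesized.
n = 1_000
confidential = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sector": rng.integers(0, 5, n),
})
confidential["income"] = (
    20_000 + 800 * confidential["age"]
    + 5_000 * confidential["sector"]
    + rng.normal(0, 10_000, n)
)

# Fit a simple linear model of the sensitive variable on the other columns.
X = np.column_stack([np.ones(n), confidential["age"], confidential["sector"]])
y = confidential["income"].to_numpy()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sd = np.std(y - X @ beta, ddof=X.shape[1])

# Partially synthetic release: non-sensitive columns are kept, income is
# simulated from the fitted model plus noise matching the residual variance.
synthetic = confidential[["age", "sector"]].copy()
synthetic["income"] = X @ beta + rng.normal(0, resid_sd, n)

print(synthetic.head())
```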

    Brand Capital and Incumbent Firms' Positions in Evolving Markets

    In many advertising-intensive industries one observes market share persistence, i.e., firms maintaining lead market shares over long periods of time. I hypothesize that firms with the largest stock of well-established brands, a stock that I term brand capital, are most likely to introduce new products in response to new market information about consumer preferences. Firms with less brand capital delay their introductions until the uncertainty concerning the market size is reduced. I present empirical support in a study of new product introductions in the U.S. beverage industry.

    Force-induced rupture of a DNA duplex

    The rupture of double-stranded DNA under stress is a key process in biophysics and nanotechnology. In this article we consider the shear-induced rupture of short DNA duplexes, a system that has been given new importance by recently designed force sensors and nanotechnological devices. We argue that rupture must be understood as an activated process, where the duplex state is metastable and the strands will separate in a finite time that depends on the duplex length and the force applied. Thus, the critical shearing force required to rupture a duplex within a given experiment depends strongly on the time scale of observation. We use simple models of DNA to demonstrate that this approach naturally captures the experimentally observed dependence of the critical force on duplex length for a given observation time. In particular, the critical force is zero for the shortest duplexes, before rising sharply and then plateauing in the long length limit. The prevailing approach, based on identifying when the presence of each additional base pair within the duplex is thermodynamically unfavorable rather than allowing for metastability, does not predict a time-scale-dependent critical force and does not naturally incorporate a critical force of zero for the shortest duplexes. Additionally, motivated by a recently proposed force sensor, we investigate application of stress to a duplex in a mixed mode that interpolates between shearing and unzipping. As with pure shearing, the critical force depends on the time scale of observation; at a fixed time scale and duplex length, the critical force exhibits a sigmoidal dependence on the fraction of the duplex that is subject to shearing.
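    A toy Arrhenius-style calculation, rather than the coarse-grained DNA models used in the paper, can illustrate why an activated-process picture makes the critical force time-scale dependent and zero for the shortest duplexes: if rupture is an escape with rate k(F) = k0 exp(-(dG_N - F x_dag)/kT), the critical force is simply the force at which the expected rupture time 1/k(F) matches the observation time. All parameter values below are assumed for illustration.

```python
# Toy activated-rupture model (illustrative parameters, not the paper's model):
# rate k(F) = k0 * exp(-(dG_N - F * x_dag) / kT); the critical force is where
# the mean rupture time 1/k equals the observation time t_obs.
import numpy as np

kT = 4.1e-21          # thermal energy at room temperature, J
k0 = 1e6              # attempt rate, 1/s (assumed)
x_dag = 2e-9          # distance to the transition state, m (assumed)
dG_per_bp = 2.5 * kT  # stability gained per base pair (assumed)

def critical_force(n_bp, t_obs):
    """Force at which the expected rupture time equals t_obs (toy model)."""
    dG = n_bp * dG_per_bp
    # Solve (1/k0) * exp((dG - F * x_dag) / kT) = t_obs for F, clipped at zero.
    f = (dG - kT * np.log(k0 * t_obs)) / x_dag
    return max(f, 0.0)

for t_obs in (1.0, 1e3):  # one second vs. ~17 minutes of observation
    forces = [critical_force(n, t_obs) * 1e12 for n in range(4, 25, 4)]
    print(f"t_obs = {t_obs:g} s, F_c (pN) for lengths 4..24 bp:",
          [f"{f:.1f}" for f in forces])
```

    In this toy model the shortest duplexes already have zero critical force, and a longer observation time shifts the whole curve down; the plateau at long lengths described in the abstract requires the more detailed treatment of shear geometry in the paper.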

    OPTIMAL PROPENSITY SCORE STRATIFICATION

    Stratifying on propensity score in observational studies of treatment is a common technique used to control for bias in treatment assignment; however, there have been few studies of the relative efficiency of the various ways of forming those strata. The standard method is to use the quintiles of propensity score to create subclasses, but this choice is not based on any measure of performance, either observed or theoretical. In this paper, we investigate the optimal subclassification of propensity scores for estimating treatment effect with respect to the mean squared error of the estimate. We consider the optimal formation of subclasses within formation schemes that require either equal frequency of observations within each subclass or equal variance of the effect estimate within each subclass. Under these restrictions, choosing the partition is reduced to choosing the number of subclasses. We also consider an overall optimal partition that produces an effect estimate with minimum MSE among all partitions considered. To create this stratification, the investigator must choose both the number of subclasses and their placement. Finally, we present a stratified propensity score analysis of data concerning insurance plan choice and its relation to satisfaction with asthma care.
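    As a point of reference, the standard equal-frequency subclassification that the paper takes as its baseline can be sketched as follows: estimate propensity scores by logistic regression, cut them into quantile-based subclasses, and combine the within-subclass treatment-control differences. The simulated data and the use of logistic regression are assumptions for illustration; the paper's contribution concerns how the number and placement of subclasses are chosen.

```python
# Minimal sketch of equal-frequency propensity-score subclassification
# (simulated data; true treatment effect is 2).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=(n, 3))                                # covariates
p_treat = 1 / (1 + np.exp(-(x @ [0.8, -0.5, 0.3])))
z = rng.binomial(1, p_treat)                               # treatment assignment
y = 2.0 * z + x @ [1.0, 1.0, -1.0] + rng.normal(size=n)    # outcome

# Estimated propensity score.
ps = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]

def stratified_effect(ps, z, y, n_strata=5):
    """Equal-frequency subclassification estimate of the treatment effect."""
    strata = pd.qcut(ps, n_strata, labels=False)
    df = pd.DataFrame({"s": strata, "z": z, "y": y})
    per_stratum = df.groupby("s").apply(
        lambda g: g.loc[g.z == 1, "y"].mean() - g.loc[g.z == 0, "y"].mean()
    )
    weights = df.groupby("s").size() / len(df)
    return float((per_stratum * weights).sum())

# More subclasses trades bias against variance; the paper asks which choice
# minimizes MSE.
for k in (5, 10, 20):
    print(k, "strata:", round(stratified_effect(ps, z, y, k), 3))
```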

    REGRESSION ADJUSTMENT AND STRATIFICATION BY PROPENSITY SCORE IN TREATMENT EFFECT ESTIMATION

    Propensity score adjustment of effect estimates in observational studies of treatment is a common technique used to control for bias in treatment assignment. In situations where matching on propensity score is not possible or desirable, regression adjustment and stratification are two options. Regression adjustment is used most often and can be highly efficient, but it can lead to biased results when model assumptions are violated. Validity of the stratification approach depends on fewer model assumptions, but it is less efficient than regression adjustment when the regression assumptions hold. To investigate these issues, we compare stratification and regression adjustment by simulation. We consider two stratification approaches: equal-frequency classes and an approach that attempts to minimize the mean squared error (MSE) of the treatment effect estimate. The regression approach we consider is a Generalized Additive Model (GAM) that flexibly estimates the relations among propensity score, treatment assignment, and outcome. We find that, under a wide range of plausible data-generating distributions, the GAM approach outperforms stratification in treatment effect estimation with respect to bias, variance, and thereby MSE. We illustrate the approaches via an analysis of data on insurance plan choice and its relation to satisfaction with asthma care.
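    A hedged sketch of the regression-adjustment alternative: regress the outcome on a flexible function of the estimated propensity score plus a treatment indicator, and read off the coefficient on treatment. A regression spline (patsy's bs in a statsmodels formula) stands in here for the paper's GAM smooth, and the data are simulated with a true effect of 2.

```python
# Sketch of smooth regression adjustment on the propensity score (a regression
# spline is used as a stand-in for the paper's GAM; data are simulated).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=(n, 3))
z = rng.binomial(1, 1 / (1 + np.exp(-(x @ [0.8, -0.5, 0.3]))))
y = 2.0 * z + np.sin(x[:, 0]) + x[:, 1] - x[:, 2] + rng.normal(size=n)

ps = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]
df = pd.DataFrame({"y": y, "z": z, "ps": ps})

# Outcome modeled as a smooth function of the propensity score plus treatment;
# the coefficient on z is the adjusted treatment-effect estimate (truth = 2).
fit = smf.ols("y ~ bs(ps, df=5) + z", data=df).fit()
print("spline-adjusted effect estimate:", round(fit.params["z"], 3))
```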

    EFFICIENT EVALUATION OF RANKING PROCEDURES WHEN THE NUMBER OF UNITS IS LARGE WITH APPLICATION TO SNP IDENTIFICATION

    Simulation-based assessment is a popular and frequently necessary approach to the evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations, and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation, using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results that are otherwise available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
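    The flavor of the approach can be illustrated with a much-simplified example (not the paper's derivation): for a single truly associated SNP ranked by z-statistic among m null SNPs, the SNP's percentile given its statistic t is approximately Normal(Phi(t), Phi(t)(1 - Phi(t))/m), so the marginal (pre-posterior) distribution of the percentile can be obtained by integrating over t rather than simulating the full scan. The effect size, number of SNPs, and normal model below are assumptions for illustration.

```python
# Compare brute-force simulation of a ranking percentile with a large-sample
# approximation obtained by integrating over the associated SNP's statistic.
import numpy as np
from scipy import stats

mu, m = 2.5, 5_000   # assumed effect size of the associated SNP, number of nulls
rng = np.random.default_rng(3)

# Direct simulation: percentile of the associated SNP among the null SNPs.
reps = 1_000
t = rng.normal(mu, 1, reps)
null_z = rng.normal(size=(reps, m))
sim_percentile = (null_z < t[:, None]).mean(axis=1)

# Large-sample approximation: percentile | t ~ Normal(Phi(t), Phi(t)(1-Phi(t))/m);
# integrate over t ~ N(mu, 1) on a grid to get the pre-posterior mean and variance.
grid = np.linspace(mu - 5, mu + 5, 401)
w = stats.norm.pdf(grid, mu, 1)
w /= w.sum()
p = stats.norm.cdf(grid)
approx_mean = np.sum(w * p)
approx_var = np.sum(w * (p * (1 - p) / m + p**2)) - approx_mean**2

print("simulated   mean/sd of percentile:",
      round(sim_percentile.mean(), 4), round(sim_percentile.std(), 4))
print("approximate mean/sd of percentile:",
      round(approx_mean, 4), round(float(np.sqrt(approx_var)), 4))
```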